1.
Sci Adv ; 10(18): eadk3452, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38691601

ABSTRACT

Machine learning (ML) methods are proliferating in scientific research. However, the adoption of these methods has been accompanied by failures of validity, reproducibility, and generalizability. These failures can hinder scientific progress, lead to false consensus around invalid claims, and undermine the credibility of ML-based science. ML methods are often applied and fail in similar ways across disciplines. Motivated by this observation, our goal is to provide clear recommendations for conducting and reporting ML-based science. Drawing from an extensive review of past literature, we present the REFORMS checklist (recommendations for machine-learning-based science). It consists of 32 questions and a paired set of guidelines. REFORMS was developed on the basis of a consensus of 19 researchers across computer science, data science, mathematics, social sciences, and biomedical sciences. REFORMS can serve as a resource for researchers when designing and implementing a study, for referees when reviewing papers, and for journals when enforcing standards for transparency and reproducibility.


Subject(s)
Consensus , Machine Learning , Humans , Reproducibility of Results , Science
2.
Nat Hum Behav ; 7(4): 478-479, 2023 04.
Article in English | MEDLINE | ID: mdl-36759587
3.
Elife ; 10, 2021 11 09.
Article in English | MEDLINE | ID: mdl-34751133

ABSTRACT

Any large dataset can be analyzed in a number of ways, and it is possible that the use of different analysis strategies will lead to different results and conclusions. One way to assess whether the results obtained depend on the analysis strategy chosen is to employ multiple analysts and leave each of them free to follow their own approach. Here, we present consensus-based guidance for conducting and reporting such multi-analyst studies, and we discuss how broader adoption of the multi-analyst approach has the potential to strengthen the robustness of results and conclusions obtained from analyses of datasets in basic and applied research.


Subject(s)
Consensus , Data Analysis , Datasets as Topic , Research
4.
Nature ; 595(7866): 181-188, 2021 07.
Article in English | MEDLINE | ID: mdl-34194044

ABSTRACT

Computational social science is more than just large repositories of digital data and the computational methods needed to construct and analyse them. It also represents a convergence of different fields with different ways of thinking about and doing science. The goal of this Perspective is to provide some clarity around how these approaches differ from one another and to propose how they might be productively integrated. Towards this end we make two contributions. The first is a schema for thinking about research activities along two dimensions (the extent to which work is explanatory, focusing on identifying and estimating causal effects, and the degree of consideration given to testing predictions of outcomes) and how these two priorities can complement, rather than compete with, one another. Our second contribution is to advocate that computational social scientists devote more attention to combining prediction and explanation, which we call integrative modelling, and to outline some practical suggestions for realizing this goal.


Subject(s)
Computer Simulation , Data Science/methods , Forecasting/methods , Models, Theoretical , Social Sciences/methods , Goals , Humans
6.
Proc Natl Acad Sci U S A ; 117(15): 8398-8403, 2020 04 14.
Article in English | MEDLINE | ID: mdl-32229555

ABSTRACT

How predictable are life trajectories? We investigated this question with a scientific mass collaboration using the common task method; 160 teams built predictive models for six life outcomes using data from the Fragile Families and Child Wellbeing Study, a high-quality birth cohort study. Despite using a rich dataset and applying machine-learning methods optimized for prediction, the best predictions were not very accurate and were only slightly better than those from a simple benchmark model. Within each outcome, prediction error was strongly associated with the family being predicted and weakly associated with the technique used to generate the prediction. Overall, these results suggest practical limits to the predictability of life outcomes in some settings and illustrate the value of mass collaborations in the social sciences.


Subject(s)
Social Sciences/standards , Adolescent , Child , Child, Preschool , Cohort Studies , Family , Female , Humans , Infant , Life , Machine Learning , Male , Predictive Value of Tests , Social Sciences/methods , Social Sciences/statistics & numerical data
7.
Socius ; 5, 2019.
Article in English | MEDLINE | ID: mdl-37214352

ABSTRACT

Researchers rely on metadata systems to prepare data for analysis. As the complexity of data sets increases and the breadth of data analysis practices grows, existing metadata systems can limit the efficiency and quality of data preparation. This article describes the redesign of a metadata system supporting the Fragile Families and Child Wellbeing Study on the basis of the experiences of participants in the Fragile Families Challenge. The authors demonstrate how treating metadata as data (i.e., releasing comprehensive information about variables in a format amenable to both automated and manual processing) can make the task of data preparation less arduous and less error prone for all types of data analysis. The authors hope that their work will facilitate new applications of machine-learning methods to longitudinal surveys and inspire research on data preparation in the social sciences. The authors have open-sourced the tools they created so that others can use and improve them.

8.
Socius ; 5, 2019.
Article in English | MEDLINE | ID: mdl-37309412

ABSTRACT

The Fragile Families Challenge is a scientific mass collaboration designed to measure and understand the predictability of life trajectories. Participants in the Challenge created predictive models of six life outcomes using data from the Fragile Families and Child Wellbeing Study, a high-quality birth cohort study. This Special Collection includes 12 articles describing participants' approaches to predicting these six outcomes as well as 3 articles describing methodological and procedural insights from running the Challenge. This introduction will help readers interpret the individual articles and help researchers interested in running future projects similar to the Fragile Families Challenge.

9.
Socius ; 5, 2019.
Article in English | MEDLINE | ID: mdl-37309413

ABSTRACT

Reproducibility is fundamental to science, and an important component of reproducibility is computational reproducibility: the ability of a researcher to recreate the results of a published study using the original author's raw data and code. Although most people agree that computational reproducibility is important, it is still difficult to achieve in practice. In this article, the authors describe their approach to enabling computational reproducibility for the 12 articles in this special issue of Socius about the Fragile Families Challenge. The approach draws on two tools commonly used by professional software engineers but not widely used by academic researchers: software containers (e.g., Docker) and cloud computing (e.g., Amazon Web Services). These tools made it possible to standardize the computing environment around each submission, which will ease computational reproducibility both today and in the future. Drawing on their successes and struggles, the authors conclude with recommendations to researchers and journals.

10.
Socius ; 5, 2019.
Article in English | MEDLINE | ID: mdl-37347012

ABSTRACT

Stewards of social data face a fundamental tension. On one hand, they want to make their data accessible to as many researchers as possible to facilitate new discoveries. At the same time, they want to restrict access to their data as much as possible to protect the people represented in the data. In this article, we provide a case study addressing this common tension in an uncommon setting: the Fragile Families Challenge, a scientific mass collaboration designed to yield insights that could improve the lives of disadvantaged children in the United States. We describe our process of threat modeling, threat mitigation, and third-party guidance. We also describe the ethical principles that formed the basis of our process. We are open about our process and the trade-offs we made in the hope that others can improve on what we have done.

11.
Demography ; 54(4): 1503-1528, 2017 08.
Article in English | MEDLINE | ID: mdl-28741073

ABSTRACT

Adult death rates are a critical indicator of population health and well-being. Wealthy countries have high-quality vital registration systems, but poor countries lack this infrastructure and must rely on estimates that are often problematic. In this article, we introduce the network survival method, a new approach for estimating adult death rates. We derive the precise conditions under which it produces consistent and unbiased estimates. Further, we develop an analytical framework for sensitivity analysis. To assess the performance of the network survival method in a realistic setting, we conducted a nationally representative survey experiment in Rwanda (n = 4,669). Network survival estimates were similar to estimates from other methods, even though the network survival estimates were made with substantially smaller samples and are based entirely on data from Rwanda, with no need for model life tables or pooling of data from other countries. Our analytic results demonstrate that the network survival method has attractive properties, and our empirical results show that this method can be used in countries where reliable estimates of adult death rates are sorely needed.


Subject(s)
Health Surveys/methods , Models, Statistical , Mortality/trends , Social Support , Adolescent , Adult , Female , Health Surveys/standards , Humans , Interviews as Topic , Middle Aged , Reproducibility of Results , Rwanda/epidemiology , Socioeconomic Factors , Young Adult
12.
Am J Epidemiol ; 183(8): 747-57, 2016 04 15.
Article in English | MEDLINE | ID: mdl-27015875

ABSTRACT

The network scale-up method is a promising technique that uses sampled social network data to estimate the sizes of epidemiologically important hidden populations, such as sex workers and people who inject illicit drugs. Although previous scale-up research has focused exclusively on networks of acquaintances, we show that the type of personal network about which survey respondents are asked to report is a potentially crucial parameter that researchers are free to vary. This generalization leads to a method that is more flexible and potentially more accurate. In 2011, we conducted a large, nationally representative survey experiment in Rwanda that randomized respondents to report about one of 2 different personal networks. Our results showed that asking respondents for less information can, somewhat surprisingly, produce more accurate size estimates. We also estimated the sizes of 4 key populations at risk for human immunodeficiency virus infection in Rwanda. Our estimates were higher than earlier estimates from Rwanda but lower than international benchmarks. Finally, in this article we develop a new sensitivity analysis framework and use it to assess the possible biases in our estimates. Our design can be customized and extended for other settings, enabling researchers to continue to improve the network scale-up method.


Subject(s)
Drug Users/statistics & numerical data , HIV Infections/epidemiology , Homosexuality, Male/statistics & numerical data , Sex Workers/statistics & numerical data , Social Environment , Social Networking , Substance Abuse, Intravenous/epidemiology , Epidemiologic Methods , Female , HIV Infections/etiology , Humans , Male , Risk Assessment/methods , Rwanda/epidemiology , Substance Abuse, Intravenous/complications , Surveys and Questionnaires
13.
Sociol Methodol ; 46(1): 153-186, 2016 Aug.
Article in English | MEDLINE | ID: mdl-29375167

ABSTRACT

The network scale-up method enables researchers to estimate the size of hidden populations, such as drug injectors and sex workers, using sampled social network data. The basic scale-up estimator offers advantages over other size estimation techniques, but it depends on problematic modeling assumptions. We propose a new generalized scale-up estimator that can be used in settings with non-random social mixing and imperfect awareness about membership in the hidden population. Further, the new estimator can be used when data are collected via complex sample designs and from incomplete sampling frames. However, the generalized scale-up estimator also requires data from two samples: one from the frame population and one from the hidden population. In some situations these data from the hidden population can be collected by adding a small number of questions to already planned studies. For other situations, we develop interpretable adjustment factors that can be applied to the basic scale-up estimator. We conclude with practical recommendations for the design and analysis of future studies.

14.
J Clin Epidemiol ; 68(12): 1463-71, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26112433

ABSTRACT

OBJECTIVES: Respondent-driven sampling (RDS) is a new data collection methodology used to estimate characteristics of hard-to-reach groups, such as the HIV prevalence in drug users. Many national public health systems and international organizations rely on RDS data. However, RDS reporting quality and available reporting guidelines are inadequate. We carried out a systematic review of RDS studies and present Strengthening the Reporting of Observational Studies in Epidemiology for RDS Studies (STROBE-RDS), a checklist of essential items to present in RDS publications, justified by an explanation and elaboration document.

STUDY DESIGN AND SETTING: We searched the MEDLINE (1970-2013), EMBASE (1974-2013), and Global Health (1910-2013) databases to assess the number and geographical distribution of published RDS studies. STROBE-RDS was developed based on STROBE guidelines, following Guidance for Developers of Health Research Reporting Guidelines.

RESULTS: RDS has been used in over 460 studies from 69 countries, including the USA (151 studies), China (70), and India (32). STROBE-RDS includes modifications to 12 of the 22 items on the STROBE checklist. The two key areas that required modification concerned the selection of participants and statistical analysis of the sample.

CONCLUSION: STROBE-RDS seeks to enhance the transparency and utility of research using RDS. If widely adopted, STROBE-RDS should improve global infectious diseases public health decision making.


Subject(s)
Epidemiologic Research Design , Observational Studies as Topic , Research Design , Sampling Studies , Surveys and Questionnaires , Humans
15.
PLoS One ; 10(5): e0123483, 2015.
Article in English | MEDLINE | ID: mdl-25992565

ABSTRACT

In the social sciences, there is a longstanding tension between data collection methods that facilitate quantification and those that are open to unanticipated information. Advances in technology now enable new, hybrid methods that combine some of the benefits of both approaches. Drawing inspiration from online information aggregation systems like Wikipedia and from traditional survey research, we propose a new class of research instruments called wiki surveys. Just as Wikipedia evolves over time based on contributions from participants, we envision an evolving survey driven by contributions from respondents. We develop three general principles that underlie wiki surveys: they should be greedy, collaborative, and adaptive. Building on these principles, we develop methods for data collection and data analysis for one type of wiki survey, a pairwise wiki survey. Using two proof-of-concept case studies involving our free and open-source website www.allourideas.org, we show that pairwise wiki surveys can yield insights that would be difficult to obtain with other methods.


Subject(s)
Data Collection/methods , Internet/statistics & numerical data , Social Sciences , Surveys and Questionnaires , Humans
16.
J R Stat Soc Ser A Stat Soc ; 178(1): 241-269, 2015 Jan.
Article in English | MEDLINE | ID: mdl-27226702

ABSTRACT

Respondent-driven sampling (RDS) is a widely used method for sampling from hard-to-reach human populations, especially populations at higher risk for HIV. Data are collected through peer-referral over social networks. RDS has proven practical for data collection in many difficult settings. Inference from RDS data requires many strong assumptions because the sampling design is partially beyond the control of the researcher and partially unobserved. We introduce diagnostic tools for most of these assumptions and apply them in 12 high risk populations. These diagnostics empower researchers to better understand their data and encourage future statistical research on RDS.

17.
Epidemiology ; 23(1): 148-50, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22157310

Subject(s)
Sampling Studies , Humans , Male
18.
Am J Epidemiol ; 174(10): 1190-6, 2011 Nov 15.
Article in English | MEDLINE | ID: mdl-22003188

ABSTRACT

One of the many challenges hindering the global response to the human immunodeficiency virus (HIV)/acquired immunodeficiency syndrome (AIDS) epidemic is the difficulty of collecting reliable information about the populations most at risk for the disease. Thus, the authors empirically assessed a promising new method for estimating the sizes of most at-risk populations: the network scale-up method. Using 4 different data sources, 2 of which were from other researchers, the authors produced 5 estimates of the number of heavy drug users in Curitiba, Brazil. The authors found that the network scale-up and generalized network scale-up estimators produced estimates 5-10 times higher than estimates made using standard methods (the multiplier method and the direct estimation method using data from 2004 and 2010). Given that equally plausible methods produced such a wide range of results, the authors recommend that additional studies be undertaken to compare estimates based on the scale-up method with those made using other methods. If scale-up-based methods routinely produce higher estimates, this would suggest that scale-up-based methods are inappropriate for populations most at risk of HIV/AIDS or that standard methods may tend to underestimate the sizes of these populations.


Subject(s)
Epidemiologic Research Design , HIV Infections/epidemiology , Substance Abuse, Intravenous/epidemiology , Acquired Immunodeficiency Syndrome/epidemiology , Acquired Immunodeficiency Syndrome/etiology , Brazil/epidemiology , HIV Infections/etiology , Humans , Prevalence , Risk Assessment , Substance Abuse, Intravenous/complications
19.
Soc Networks ; 33(1): 70-78, 2011 Jan 01.
Article in English | MEDLINE | ID: mdl-21318126

ABSTRACT

Estimating the sizes of hard-to-count populations is a challenging and important problem that occurs frequently in social science, public health, and public policy. This problem is particularly pressing in HIV/AIDS research because estimates of the sizes of the most at-risk populations (illicit drug users, men who have sex with men, and sex workers) are needed for designing, evaluating, and funding programs to curb the spread of the disease. A promising new approach in this area is the network scale-up method, which uses information about the personal networks of respondents to make population size estimates. However, if the target population has low social visibility, as is likely to be the case in HIV/AIDS research, scale-up estimates will be too low. In this paper we develop a game-like activity that we call the game of contacts in order to estimate the social visibility of groups, and report results from a study of heavy drug users in Curitiba, Brazil (n = 294). The game produced estimates of social visibility that were consistent with qualitative expectations but of surprising magnitude. Further, a number of checks suggest that the data are high-quality. While motivated by the specific problem of population size estimation, our method could be used by researchers more broadly and adds to long-standing efforts to combine the richness of social network analysis with the power and scale of sample surveys.
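
The abstract's point that low social visibility pushes scale-up estimates downward can be illustrated with a minimal sketch. It assumes, for illustration only, that visibility can be summarized as a single proportion between 0 and 1 (the share of a hidden-population member's contacts who are aware of that membership) and that a basic estimate is corrected by dividing by it. The function name and the numbers are hypothetical and are not taken from the paper.

```python
def visibility_adjusted_estimate(basic_estimate: float, visibility: float) -> float:
    """Inflate a basic scale-up estimate to compensate for low social visibility.

    Assumption (not from the abstract): visibility is one number in (0, 1],
    where 1 means every contact is aware of the person's membership.
    """
    if not 0 < visibility <= 1:
        raise ValueError("visibility must be in the interval (0, 1]")
    # If only a fraction of ties are "visible", respondents under-report
    # by roughly that fraction, so the estimate is scaled back up.
    return basic_estimate / visibility


# Hypothetical numbers: a basic estimate of 5,000 people and visibility of 0.5.
adjusted = visibility_adjusted_estimate(5000, 0.5)
print(adjusted)  # 10000.0
```

With visibility equal to 1 the estimate is unchanged; the lower the visibility, the larger the upward correction.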

20.
Sex Transm Infect ; 86 Suppl 2: ii11-5, 2010 Dec.
Article in English | MEDLINE | ID: mdl-21106509

ABSTRACT

Estimating sizes of hidden or hard-to-reach populations is an important problem in public health. For example, estimates of the sizes of populations at highest risk for HIV and AIDS are needed for designing, evaluating and allocating funding for treatment and prevention programmes. A promising approach to size estimation, relatively new to public health, is the network scale-up method (NSUM), involving two steps: estimating the personal network size of the members of a random sample of a total population and, with this information, estimating the number of members of a hidden subpopulation of the total population. We describe the method, including two approaches to estimating personal network sizes (summation and known population). We discuss the strengths and weaknesses of each approach and provide examples of international applications of the NSUM in public health. We conclude with recommendations for future research and evaluation.


Subject(s)
Data Collection/methods , Public Health/statistics & numerical data , Humans , Risk Assessment , Sample Size
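
The two NSUM steps described in the abstract of entry 20 can be sketched as follows. This is a minimal illustration, assuming the known population approach for step one and the basic ratio-of-sums estimator for step two, without survey weights or any adjustment. The function names and all numbers are hypothetical and do not come from the article.

```python
from typing import Sequence


def known_population_degree(
    known_counts: Sequence[float],
    known_sizes: Sequence[float],
    total_population: float,
) -> float:
    """Step 1: estimate one respondent's personal network size.

    known_counts[j] is how many people the respondent reports knowing in
    known group j; known_sizes[j] is that group's true size.
    """
    total_known = sum(known_sizes)
    if total_known <= 0:
        raise ValueError("known group sizes must sum to a positive number")
    return total_population * sum(known_counts) / total_known


def basic_scale_up(
    hidden_counts: Sequence[float],
    degrees: Sequence[float],
    total_population: float,
) -> float:
    """Step 2: estimate the hidden population size from reported ties.

    hidden_counts[i] is how many hidden-population members respondent i
    reports knowing; degrees[i] is that respondent's estimated network size.
    """
    total_degree = sum(degrees)
    if total_degree <= 0:
        raise ValueError("estimated network sizes must sum to a positive number")
    return total_population * sum(hidden_counts) / total_degree


# Hypothetical survey: 3 respondents, 2 known groups, population of 1,000,000.
N = 1_000_000
known_sizes = [10_000, 20_000]
reports = [[2, 4], [1, 2], [3, 6]]   # ties to each known group, per respondent
hidden_reports = [1, 0, 2]           # ties to the hidden group, per respondent

degrees = [known_population_degree(r, known_sizes, N) for r in reports]
estimate = basic_scale_up(hidden_reports, degrees, N)
print(degrees)   # [200.0, 100.0, 300.0]
print(estimate)  # 5000.0
```

The summation approach mentioned in the abstract would replace only step one: network size is obtained by adding up reported contacts across relationship types instead of scaling from known groups.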